I.   INTRODUCTION

The  purpose  of  this  paper  is  twofold:   1)  To  present  a  brief
survey  of  the  literature  on  parallel  processing.   2)  To  present  a 
proposal  for  a  flowchart  programming  system  which  provides  facilities 
for  the  specification  of  parallel  execution  of  tasks  within  a  job  and 
of  computations  within  a  task. 

The  literature  on  parallel  processing  abounds  in  terminology 
whose  definitions  vary,  reflecting  the  different  points  of  view  taken 
by  the  authors.   In  this  paper  a  working  set  of  definitions  is  first 
introduced.   By  tying  the  important  concepts  of  parallel  processing  to 
this  set  of  definitions  the  ambiguities  reflected  in  the  literature 
have  been  avoided  in  an  attempt  to  emphasize  points  of  agreement  without 
dwelling  on  the  differences  in  definitions. 

Investigations  into  task  and  subtask  parallelism  are  going 
in  two  directions.   The  first  is  the  introduction  of  programming 
facilities  that  encourage  parallelism  to  be  detected  and  used  at  the 
coding  level.   The  other  is  the  implementation  of  algorithms  at  the 
compiling  level  to  investigate  potential  parallelism  in  sequentially 
coded  tasks.

Programming  techniques  which  constrain  problem  solutions  to 
a  sequential  order  have  merely  hidden  any  inherent  parallelism  in  these 
solutions.   By  introducing  new  programming  techniques  which  relax 
sequential  constraints,  the  programmer  can  re-introduce  the  inherent 
parallelism  in  a  problem  solution  and  use  it  to  find  algorithms  which 
are  faster  and  more  naturally  follow  thought  processes.   The  proposal  in 
this  paper  is  for  a  programming  system  which  allows  programmer  specifi- 
cation of  parallelism. 


II.   SURVEY

The  following  framework  and  definitions  for  a  general 
discussion  of  parallel  processing  have  been  chosen  as  practical 
vehicles  for  this  paper.   They  do  not  purport  to  cover  all  the 
nuances  of  a  term  which  is  used  freely  to  describe  many  activities. 

A.   Definitions  and  Background 
A  process,   P,  is  defined  to  be  a  single  executable  unit 
that  once  started  may  be  completed.   No  other  process  is  initiated 
by  P's  action,  only  possibly  by  P's  completion.   A  processor   is 
that  which  executes  the  process  P,  and  parallel  processors   are  a 
set  of  processors  executing  processes  simultaneously. 

These  definitions  can  be  applied  to  computations  on  three 
levels  of  sophistication. 

1.  The  most  fundamental  process  in  a  computer  is  signal 
transmission.   The  processors  are  the  gates  which 
transmit  the  signals.   Parallel  processors  are  a  set 
of  gates  activated  simultaneously  as  the  result  of  a 
former  gating.   Bit  parallel  adders  are  an  example  of 
parallel  processors. 

2.  At  the  next  level,  a  process  is  identified  as  an 
instruction  cycle  comprising  decoding  and  execution. 
The  processor  is  the  central  processing  unit  (CPU) 
composed  of  the  control  unit  and  the  arithmetic/logic
units  (ALU).   By  duplicating  the  CPU  or  perhaps  just 
the  ALU,  it  is  possible  to  execute  instructions 
simultaneously. 


3.   The  highest  level  process  is  identified  with  an 
instruction  stream  which  obeys  the  constraints  of 
the  definition  of  a  process.   This  could  be  as
elementary  as  a  single  instruction.   The  processor 
is  a  CPU  or  an  I/O  channel  and  a  memory  for  storing 
the  instruction  stream.   The  channel  considered  is 
capable  of  decoding  and  executing  its  own  channel 
program.   Although  a  channel  requires  the  CPU  for 
its  initiation  it  is  then  capable  of  completing  the 
action  initiated  independently  of  the  CPU.   Parallel 
processors  could  be  multiple  CPU's  or  channels 
sharing  a  common  memory.   Multiprocessing  is  the
term  most  generally  used  to  describe  multiple  CPU's
sharing  a  common  memory.

The  primary  impetus  behind  parallel  processing  has  been  the
search  for  methods  of  increasing  execution  speed,  and  secondarily,  for 
ways  to  efficiently  use  increasingly  more  sophisticated  hardware.   The 
second  statement  above  is  open  to  controversy.   Some  would  say  that 
sophisticated  hardware  is  a  result  of  meeting  the  needs  of  rapidly 
developing  concepts  in  software.   More  correctly,  the  two  probably 
develop  with  little  communication  of  future  goals.   However,  without 
sophisticated  hardware,  the  development  of  software  for  parallel 
processing  discussed  in  this  paper  would  be  a  futile  pursuit.

Pre-von  Neumann  computers  were  capable  of  executing
arithmetic  instructions  in  parallel.   This  feature  was  abandoned  with
the  advent  of  stored  program  computers  because  the  overall  gain  in
speed  was  felt  to  outweigh  the  difficulties  in  adapting  electronic
circuits  to  this  technique.   Early  attempts  to  rectify  this  loss
resulted  in  the  parallel  gating  concepts  discussed  above.   Some 
computers  were  then  designed  with  an  advanced  control  which  could 
scan  ahead  for  neighboring  arithmetic  instructions  that  could  be 
performed  simultaneously  using  a  second  ALU.   The  next  attempts  to 
increase  speed  were  found  in  the  proposals  for  Solomon  I  and 
Solomon  II,  and  later  in  ILLIAC  IV,  i.e.  the  replication  of  ALU's 
for  simultaneous  execution  of  a  single  instruction  on  different  data. 

In  the  input-output  world,  the  advent  of  faster  memories 
found  slow  mechanical  devices  to  be  holding  up  execution  time,  so 
techniques  were  designed  to  make  peripheral  equipment  independent 
of  internal  computation.   Internal  processing  could  continue  while 
awaiting  completion  and  notification  from  slower  equipment.   Time- 
sharing, in  general,  will  not  be  called  parallel  processing  since 
only  one  process  is  in  control  at  any  one  time.   However,  channels 
capable  of  decoding  and  executing  their  own  channel  programs  can 
execute  in  parallel  with  a  CPU,  so  this  will  be  called  parallel 
processing  even  if  used  to  implement  a  timesharing  system. 

The  rest  of  this  discussion  will  be  confined  to  one  particular 
type  of  parallel  processing  included  in  category  3.   This  is  the 
execution  of  instruction  streams  which  are  logically  related  in  that 
they  are  initiated  simultaneously  by  another  single  process. 
Specifically,  the  topics  of  interest  will  be  coding,  compiling  and 
technical  considerations. 

Two  different  schools  of  thought  have  arisen  concerning 
parallelism  at  the  source  program  level. 


B.   Explicit  Parallelism 
Those of the first school propose that potential
parallelism be recognized by the analyst and coded in a source
language that allows explicit declaration of parallelism. Most
existing widely accepted languages have no facility for expressing
parallelism. This deficiency has been recognized by the designers
of PL/I who included the 'TASK' option with the 'CALL' statement to
initiate the simultaneous execution of a parallel task (Radin and
Rogoway).

Tesler and Enea [22] have proposed Single Assignment
Languages in which all statements are assignments and no variable
can be assigned a value by more than one statement. No explicit
indication of sequence or concurrence is needed, for the assignments
would define dependence relations which in turn would define the order
of execution.
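
As an illustration of how dependence relations alone can fix the order of execution, the following Python sketch groups single-assignment statements into successive "waves", each of which could in principle be executed concurrently. It is only a sketch under stated assumptions: the function name `execution_waves`, the table format, and the five example statements are hypothetical and are not taken from Tesler and Enea.

```python
def execution_waves(statements):
    """Group single-assignment statements into waves that may run concurrently.

    `statements` maps each variable name to the set of variable names read
    by its (single) defining statement. Every variable that is read must
    also be defined somewhere in the table.
    """
    remaining = {name: set(reads) for name, reads in statements.items()}
    defined = set()
    waves = []
    while remaining:
        # A statement is ready once everything it reads has been defined.
        ready = sorted(n for n, reads in remaining.items() if reads <= defined)
        if not ready:
            raise ValueError("circular or missing dependence: "
                             + ", ".join(sorted(remaining)))
        waves.append(ready)
        defined.update(ready)
        for name in ready:
            del remaining[name]
    return waves


# Hypothetical program: each variable is assigned exactly once.
program = {
    "a": set(),        # a = 2
    "b": set(),        # b = 3
    "c": {"a", "b"},   # c = a + b
    "d": {"a"},        # d = a * a
    "e": {"c", "d"},   # e = c - d
}
waves = execution_waves(program)
print(waves)
```

No statement in the table says anything about sequence; the grouping follows entirely from which variables each statement reads.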

Another approach to expressing parallelism is Sutherland's
graphical specification of computer procedures [21] in which parallelism
can be initiated by control or data flow lines simultaneously activating
nodes of a graph representing executable operations. More will be said
later of Sutherland's work.

The  most  common  solutions  in  lieu  of  a  new  language  have 
been  the  proposals  for  implementing  existing  languages,  generally  ALGOL 
or  FORTRAN,  with  statements  necessary  for  explicitly  declaring 
parallelism.   The  general  ideas  proposed  are: 

1.   Some explicit declaration of the initiation of simultaneous
paths is necessary. The 'FORK' statement, attributed to
Conway [6], is the most widely known suggestion, and the
concept is generally the same regardless of the various
other names given to it. Two types of 'FORK' are generally
recognized.

1a.  The first 'FORK' proposed (see Conway and Opler)
establishes parallel paths which eventually converge to
a point in the program beyond which no execution can
proceed until all the activated paths have been
completed. The usual implementation is a counter
established by the 'FORK' which is decremented by each
path on completion, and only the processor which sets
the counter to zero will continue processing beyond the
convergence point, the other processors having been
released.

1b.  Anderson, and later Gosden, pointed out that there are
circumstances in which parallel paths need
not converge. So, a modified 'FORK' is needed to
establish parallel paths which just release their
processors at the end.

2.   'FORK' statements initiating paths that will converge
also reference the point in the program to which they
converge. This is normally called the 'JOIN' statement,
which is the location of the counter, and acts as a 'WAIT'
statement to which all the paths branch. The last
processor to branch to the 'JOIN' will then go to the
next sequential instruction.

3.   Some statement is also necessary to terminate parallel
paths that do not converge. This is generally called
an 'IDLE' statement which releases the processor, and
initiates no other action.

4.   Parallel paths conceivably will wish to use common
data, so some statement is needed to prevent the
destructive manipulation of data by other processes
while one process wishes to have exclusive use of that
data. Such statements are 'OBTAIN' or 'LOCK', and
their counterparts to release exclusive use are
'RELEASE' or 'UNLOCK'. Dennis and Van Horn have
noted that it is important that once exclusive use is
gained it not be held indefinitely, such as might
happen if an interrupt were to break the flow of the
process, or the process were to go into an infinite
loop. Either interrupts could be inhibited during
'LOCK'-'UNLOCK' separations or a timing limit could be
set on the span of the 'LOCK'-'UNLOCK'.
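
The counter implementation of a converging 'FORK' and its 'JOIN' described in items 1a and 2 can be sketched in Python as follows. Threads stand in for processors, and the names `Join`, `arrive`, and `fork` are hypothetical, chosen for this illustration only. Each path decrements the counter on completion, and only the path that brings it to zero executes the code following the convergence point.

```python
import threading


class Join:
    """Counter established by the fork and decremented by each finishing path."""

    def __init__(self, n_paths):
        self._remaining = n_paths
        self._guard = threading.Lock()   # makes the decrement indivisible

    def arrive(self):
        """Decrement the counter; return True only for the path that zeroes it."""
        with self._guard:
            self._remaining -= 1
            return self._remaining == 0


def fork(paths, continuation):
    """Start every path in parallel; the last one to finish runs `continuation`."""
    join = Join(len(paths))

    def run(path):
        path()
        if join.arrive():
            continuation()   # only the last path to arrive proceeds
        # every other path ends here, releasing its "processor"

    threads = [threading.Thread(target=run, args=(p,)) for p in paths]
    for t in threads:
        t.start()
    return threads


log = []
threads = fork(
    [lambda: log.append("path 1"),
     lambda: log.append("path 2"),
     lambda: log.append("path 3")],
    continuation=lambda: log.append("after join"),
)
for t in threads:
    t.join()   # wait only so that the demonstration can inspect the log
print(log)
```

The order of the three path entries in the log may vary from run to run, but "after join" is always the last entry and appears exactly once.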

Wirth [24] has proposed a procedure declaration called
'CRITICAL SECTION' for instruction streams common to several processes
which must not be executed simultaneously. This would act like 'LOCK'
and prevent other processes from entering that instruction stream while
it is active.

If explicit declaration of parallelism is going to be useful,
computational techniques will have to be re-examined to re-introduce any
parallelism that was abandoned in developing serial algorithms for a
computer. Several interesting algorithms for parallel methods in
numerical analysis have been proposed (Nievergelt, Pease,
Gilmore, and Miranker and Liniger). The purpose here is not
to survey this work, but merely to point out that progress is being
made in the search for new algorithms. This work further emphasizes
the need for methods of conveying this new parallelism to a computer.


C.   Implicit  Parallelism 
Those of the other school of thought feel that coding is
difficult enough and already too error prone. Further, the programmer
should not be saddled with the added burden of recognizing parallelism
and ensuring determinacy in his parallel procedures. In his paper
Bernstein [2] proposes a set of sufficient conditions based on data
reads and writes for determining parallelism at the subprogram level.
Bingham, et al. [3], have written an algorithm for determining parallelism
in programs based on I/O set intersections and any known essential
serial order.
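
A test of this kind can be sketched briefly in Python. The sketch assumes the form in which Bernstein's conditions are usually quoted: two program blocks may be executed in parallel if neither writes what the other reads and they write no variable in common. The `Block` type and the three example blocks are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A program block described only by the variables it reads and writes."""
    reads: frozenset
    writes: frozenset


def can_run_in_parallel(p, q):
    """Sufficient test based on read and write sets (assumed usual form)."""
    return (p.writes.isdisjoint(q.reads)        # p does not alter q's inputs
            and p.reads.isdisjoint(q.writes)    # q does not alter p's inputs
            and p.writes.isdisjoint(q.writes))  # no common output variable


# Hypothetical blocks:
p = Block(reads=frozenset({"a", "b"}), writes=frozenset({"x"}))  # x = a + b
q = Block(reads=frozenset({"a", "c"}), writes=frozenset({"y"}))  # y = a * c
r = Block(reads=frozenset({"x"}), writes=frozenset({"a"}))       # a = x + 1

print(can_run_in_parallel(p, q))   # both only read a
print(can_run_in_parallel(p, r))   # r reads x, which p writes
```

Because the conditions are sufficient rather than necessary, a result of False means only that parallel execution cannot be shown safe from the read and write sets alone.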

In his paper, Stone describes a one-pass algorithm for
compiling arithmetic expressions such that the resulting expression is
inherently parallel. He assumes the use of a CPU with several
independent arithmetic units of the type discussed in category 2 above.
The resultant code would yield a set of neighboring instructions which
a scan ahead feature could recognize as independent and distribute to
the arithmetic units. This paper is pertinent to this discussion because
of the obvious extension to the multiple CPU environment. The compiler
could compile simulated 'FORK's and 'JOIN's to divide the work of
computing the arithmetic expressions among several CPU's.


D.   Technical  Considerations 
The  following  considerations  are  important  at  the  time 
of  execution  no  matter  what  the  level  at  which  parallelism  is 
implemented. 

1.  The  Critical  Section.   Instruction  streams  executing 
in  parallel  may  have  access  to  the  same  facilities, 
whether  storage  or  I/O,  and  the  time  that  this  access
is  desired  is  termed  a  critical  section.   The  key 
problem  is  to  assure  that  no  two  parallel  paths  enter 
their  critical  sections  relating  to  the  same  facilities 
simultaneously. 

2.  Queuing. Closely related to the problem above is the
queuing of processes for execution of their critical
sections. It is important that no one process be
delayed indefinitely. Dijkstra was the first to
propose an algorithm for queuing processes with common
critical sections. Knuth [12] later pointed out that
although not all processes would be blocked indefinitely
by Dijkstra's proposal, one single process could be
blocked indefinitely. Knuth suggested a modification to
the algorithm which hopefully would clear up this oversight.
Both of these proposals are instructive for
anyone concerned with the implementation of parallel
processing software.

3.  Determinacy. When searching for potential parallelism
at the compiler level, basic rules about data reads and
writes (which are somewhat restrictive) must be assumed.
If an instruction stream, M, is changing the value of
a data variable, C, and another instruction stream, N,
is just reading C, then the algorithms should assume
that the order of M relative to N must be maintained.
The algorithms cannot anticipate that the order of M
and N perhaps has no relevance. If the programmer
were to code the algorithm explicitly specifying
parallelism, two paths, M' and N', corresponding to
M and N respectively, could conceivably be executed
in parallel. The 'LOCK'-'UNLOCK' feature must be
provided, however, to assure that M' does not
destructively manipulate C while N' is using C, and
N' does not attempt to access C while it is being
modified by M'. In a sense the sub-instruction streams
of M' and N' concerned with C are not so much parallel
as commutative. The time relationship of the executions
of the two sub-instruction streams is unimportant as long
as the executions are not simultaneous, so the 'LOCK'-'UNLOCK'
feature prevents true parallelism.

To assure determinacy for implicit parallelism is to
assume that the output at the end of a set of parallel
paths is the same regardless of the time relationship of
the executions of the individual paths as long as the serial
order within the paths is maintained. Determinacy in
explicit parallelism is the responsibility of the programmer,
and interpretation of its meaning may be very broad for a
particular problem solution.
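
The notion of output that does not depend on the time relationship of the paths can be illustrated with a small Python check. The sketch makes a strong simplifying assumption: each path is treated as one indivisible unit, as it would be if its whole body lay between a 'LOCK' and an 'UNLOCK', so only the orderings of whole paths are tried and not the interleavings of individual instructions. The function `is_determinate` and the four example paths are hypothetical.

```python
from itertools import permutations


def is_determinate(paths, initial_state):
    """Return True if every ordering of the paths leaves the same final state.

    Each path is a function that modifies a state dictionary in place and is
    treated as indivisible (an assumption of this sketch).
    """
    outcomes = set()
    for order in permutations(paths):
        state = dict(initial_state)
        for path in order:
            path(state)
        outcomes.add(tuple(sorted(state.items())))
    return len(outcomes) == 1


def add_five(state):
    state["C"] += 5


def add_seven(state):
    state["C"] += 7


def write_c(state):      # plays the role of M': changes C
    state["C"] = 1


def read_c(state):       # plays the role of N': only reads C
    state["copy"] = state["C"]


commutative = is_determinate([add_five, add_seven], {"C": 0})
order_sensitive = is_determinate([write_c, read_c], {"C": 0, "copy": 0})
print(commutative, order_sensitive)
```

The two additions commute, so their relative order is irrelevant; the writer and reader pair gives different results depending on which runs first, which is the case a compiler-level algorithm must assume whenever it sees a write and a read of the same variable.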

E.   Advantages
Parallelism of the type discussed above has several
advantages in both multi-CPU and single-CPU environments.

1.  Multi-CPU's. If there are not enough active tasks
to use all the CPU's then parallel paths within a
single task can use otherwise idle equipment. If
there are high priority jobs with parallel specifications
within the tasks comprising the job then several CPU's
can be committed to that job to decrease the turnaround
time. By running batch processes serially and
dedicating several processors to one job, the turnaround
time for a number of large jobs can be improved.
System overhead time can be saved in overlays of data
and program segments. As an economic factor, the number
of memory hours (amount of memory occupied times the
length of time of occupation) taken by a single job
can be reduced.

2.  Single-CPU's. In a timesharing environment one task
with outstanding priority could have alternate path
specifications for I/O wait time rather than relinquishing
control to a lower priority task. Again overhead time
could be saved by reducing the number of overlays.


III.   PROPOSAL

A.   Flowchart  Programming  System 
The following proposal for a programming system which
provides for explicit declaration of parallelism is an extension of
work done by Fontaine K. Richardson [19] on the Computer-Aided
Programming System.

The  systems  analyst  uses  the  flowchart  as  a  means  of 
explicitly  showing  the  logic  flow  between  tasks  comprising  a  job  and 
the  logic  flow  within  a  task.   From  the  flowchart  for  a  task  a 
programmer  produces  coding,  and  the  most  important  feature  of  the 
flowchart  is  lost,  the  explicit  indication  of  control  flow.   The 
flowchart  is  seldom  updated  as  the  coding  is  debugged  or  altered  and 
an  important  documentation  tool  is  abandoned. 

Richardson's  work  consists  of  utilizing  the  symbols  of  a 
flowchart,  the  nodes  and  branches,  as  an  extension  of  a  higher  level 
programming  language  called  FPL/I  to  form  a  source  programming 
language.   The  flowchart  structure  is  used  to  show  the  control  flow 
in  the  program;  executable  statements  written  in  FPL/I  are  associated 
with  the  nodes,  and  the  arrows  determine  the  sequence  of  execution  of 
the  texts  associated  with  the  nodes.   The  flowchart  and  the  program 
statements  are  constructed  on  an  interactive  graphics  terminal  using 
a  light  pen,  and  are  then  converted  to  digital  information  either  for 
storage  or  immediate  transmission  to  a  flowchart  interpreter  for  online
execution.

The nodes and branch, collectively called symbols, used by
Richardson are:

Nodes:   Process, Decision, Connector

Branch:  Arrow


The  symbol  types  are  used  to  reconstruct  the  flowchart 
internally  in  a  linked  list  format.   The  executable  statements  are 
decoded  and  executed  (interpreted)  one  by  one  as  control  flows  from 
symbol  to  symbol.   An  entry  point  is  a  process  symbol  with  no 
previous  linkage  whose  associated  text  is  taken  as  the  entry  point 
name.   An  exit  point  is  a  symbol  with  no  succeeding  linkage.   Decision 
branching  is  handled  by  the  setting  of  a  condition  register  following 
the  execution  of  FPL/I  test  statements.   The  condition  register  is 
compared  with  the  text  associated  with  the  arrows  leading  from  a 
decision  symbol  to  determine  which  branch  to  take.   (See  Figure  1.)
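
The interpretation scheme just described can be sketched in a few lines of Python. This is not Richardson's implementation: a dictionary stands in for the linked list, Python functions stand in for FPL/I text, and all names (`run_flowchart`, the symbol names, the "YES" and "NO" arrow labels) are hypothetical. The sketch keeps the essential mechanism: texts are executed one by one as control moves from symbol to symbol, a test sets a condition register, and the register is matched against the text on the arrows leaving a decision symbol.

```python
def run_flowchart(symbols, entry, env):
    """Interpret symbols one by one, following arrows until an exit point.

    `symbols` maps a symbol name to (kind, text, arrows), where `arrows`
    maps an arrow label to the next symbol name (label None if unlabeled).
    """
    condition_register = None
    current = entry
    while True:
        kind, text, arrows = symbols[current]
        if kind == "process":
            text(env)                        # execute the associated text
        elif kind == "decision":
            condition_register = text(env)   # the test sets the register
        # connector symbols carry no text; they only pass control along
        if not arrows:
            return env                       # no succeeding linkage: exit point
        label = condition_register if kind == "decision" else None
        current = arrows[label]              # follow the matching arrow


def start(env):
    env["i"], env["total"] = 1, 0


def add(env):
    env["total"] += env["i"]
    env["i"] += 1


def more(env):
    return "YES" if env["i"] <= env["n"] else "NO"


# Hypothetical flowchart that sums the integers 1..n.
symbols = {
    "SUM":  ("process",   start, {None: "TEST"}),
    "TEST": ("decision",  more,  {"YES": "ADD", "NO": "END"}),
    "ADD":  ("process",   add,   {None: "TEST"}),
    "END":  ("connector", None,  {}),
}
result = run_flowchart(symbols, "SUM", {"n": 4})
print(result["total"])
```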




The  advantages  of  this  programming  system  are  summarized 
in  the  following: 

1.  Analysis,  coding,  and  documentation  are  combined  into 
one  step.   As  an  analysis  of  the  problem  solution  is 
made  in  the  format  of  a  flowchart,  the  coding  and  a 
part  of  the  documentation  are  created  automatically. 
At  any  time  that  the  program  is  later  changed  due  to 
changes  in  specification,  the  coding  and  documentation 
are  automatically  updated. 

2.  Debugging  facilities  are  provided  that  allow  the 
programmer  to  trace  the  execution  of  the  flowchart  by 
blinking  the  symbols  as  they  are  executed.   With  the 
explicit  control  flow  before  him,  the  programmer  can 
more  easily  determine  the  sources  of  his  errors.   As 
errors  are  detected  and  the  program  is  changed,  the 
flowchart  is  kept  up  to  date. 

FPL/I  provides  the  programmer  with  certain  basic  machine 
functions  called  primitives  which  include  arithmetic  capabilities. 
Additional  functions  may  be  provided  by  means  of  flowcharts  defined
by  the  programmer.  Programmer  defined  functions  are  then  used  just 
as  the  primitives  are  used. 

The  extension  of  this  programming  system  to  parallel 
applications  follows  naturally.   The  counterpart  of  the  'FORK'  to 
several  paths  can  be  implemented  by  multiple  branches  from  a  process, 
decision,  or  connector  symbol.   (See  Figure  2.)   In  multiple
branching  from  a  decision  symbol  the  only  restriction  is  that  the 
multiple  paths  be  labeled  with  the  same  branch  condition. 


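Figure 2. (Diagram not reproduced: multiple branches leaving a PROCESS symbol, a connector symbol, and a DECISION symbol whose branches carry the same condition label.)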


The  counterpart  of  the  join  statement  will  require  the 
introduction  of  a  new  symbol,  the  conjunction,  which  will  signify  to 
the  flowchart  interpreter  that  several  paths  will  converge  to  this 
symbol.   Control  is  not  to  pass  beyond  the  conjunction  until  all  the 
converging  paths  have  been  completed.   The  number  of  parallel  paths 
that  must  be  completed  is  indicated  by  the  number  of  arrows  leading 
to  the  conjunction.   (See  Figure  3.)

Figure  3.
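
The behavior of the conjunction symbol can be sketched as a small sequential simulation in Python. The sketch is an assumption about how an interpreter might treat the symbol, not a description of an existing one: a queue stands in for the available processors, and the names `run_with_conjunctions` and the example symbols are hypothetical. The number of paths to wait for is not declared anywhere; it is taken from the number of arrows leading to the conjunction.

```python
from collections import deque


def run_with_conjunctions(arrows, conjunctions, entry):
    """Simulate control flow; return the order in which symbols are executed.

    `arrows` is a list of (source, destination) pairs and `conjunctions`
    is the set of symbol names that are conjunction symbols.
    """
    successors = {}
    in_degree = {}
    for src, dst in arrows:
        successors.setdefault(src, []).append(dst)
        in_degree[dst] = in_degree.get(dst, 0) + 1

    arrived = {c: 0 for c in conjunctions}
    ready = deque([entry])
    trace = []
    while ready:
        symbol = ready.popleft()
        trace.append(symbol)
        for dst in successors.get(symbol, []):
            if dst in conjunctions:
                arrived[dst] += 1
                if arrived[dst] < in_degree[dst]:
                    continue      # other converging paths are not yet complete
            ready.append(dst)     # control passes on
    return trace


# Hypothetical flowchart: three parallel paths converge on one conjunction.
arrows = [
    ("start", "p1"), ("start", "p2"), ("start", "p3"),
    ("p1", "conj"), ("p2", "conj"), ("p3", "conj"),
    ("conj", "finish"),
]
trace = run_with_conjunctions(arrows, {"conj"}, "start")
print(trace)
```

The multiple arrows leaving "start" act as the fork; "conj" is executed once, and only after all three incoming paths have reached it.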

To control the use of common data at critical access times,
two primitives will be provided. LOCK(var1, var2, ...) will make the
data variables listed the exclusive property of the branch issuing the
LOCK. UNLOCK(var1, var2, ...) is the corresponding primitive to release
previously locked variables.

B.      Another  Graphical  Technique 
A different approach to graphical specification of procedures
is considered by W. R. Sutherland in his thesis, "On-Line Graphical
Specification of Computer Procedures" [21]. In contrast to the work
described above where branches between nodes represent control flow,
in Sutherland's graphs branches represent both control and data flow.
A single symbol is allowed within a node which represents either a
primitive function or a function previously defined in terms of other
functions and primitives. Both primitives and functions operate on data
received over the input branches.

In Figure 4 the node represents the function addition with
two inputs, A and B. Once A and B are defined the function can be
performed, and the result is sent to the next node. Nodes are activated
only when the inputs are defined. To prevent continuous activation, an
indication of change is also transmitted with the data value. Data
values can also be destroyed after use by allowing each branch to
optionally have a visible data destruction marker. (See Figure 5.)
The marker indicates that the input is set to an undefined state
after its value is used.
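
The activation rule for such a node can be sketched in Python. The class below is a loose model and not Sutherland's system: the names `Node`, `send`, and `UNDEFINED` are hypothetical, and the "indication of change" is modeled simply by letting a node attempt to fire only when a value is delivered to one of its inputs.

```python
UNDEFINED = object()   # marks an input branch that holds no value


class Node:
    """An operator node that fires when all of its input branches are defined."""

    def __init__(self, function, n_inputs, destroy=()):
        self.function = function
        self.inputs = [UNDEFINED] * n_inputs
        self.destroy = set(destroy)   # inputs carrying a destruction marker

    def send(self, index, value):
        """Deliver a value on one input branch; return the output if the node fires.

        Delivery of a value plays the role of the change indication, so a
        node never fires repeatedly on unchanged inputs.
        """
        self.inputs[index] = value
        if any(v is UNDEFINED for v in self.inputs):
            return UNDEFINED          # some input is still undefined
        result = self.function(*self.inputs)
        for i in self.destroy:
            self.inputs[i] = UNDEFINED    # value destroyed after use
        return result


# Addition node with inputs A (index 0) and B (index 1).
add = Node(lambda a, b: a + b, 2)
first = add.send(0, 3)        # B not yet defined: no activation
second = add.send(1, 4)       # both defined: fires
third = add.send(0, 10)       # A changes, B is still defined: fires again

# Same node, but input B carries a destruction marker.
marked = Node(lambda a, b: a + b, 2, destroy=[1])
marked.send(0, 3)
fired = marked.send(1, 4)     # fires, then B is reset to undefined
after = marked.send(0, 10)    # B is undefined again: no activation
print(second, third, fired, after is UNDEFINED)
```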


Figure 4. (Node labels: inputs A and B, output A+B.)


Figure 5.


The  use  of  explicit  control  flow  introduces  one  or  more 
branches  other  than  the  data  branches  as  inputs  to  a  node.   These 
control  branches  pass  no  data  values,  just  a  signal  that  allows  the 
node  to  operate  if  its  data  values  are  defined. 

In  parallel  operations  forks  are  realized  by  the  splitting 
of  data  or  control  lines  to  several  operator  nodes.   These  nodes  are 
activated  if  the  current  condition  of  the  other  input  lines  is 
'defined'.   An  example  is  given  in  Figure  6. 
Figure  6.

No  mention  of  the  problems  of  simultaneous  accessing  of 
data  at  critical  time  is  made  by  Sutherland  because  data  values  are 
associated  with  branches  rather  than  with  fixed  locations  in  memory.


C.   Comparison 

In the following paragraphs a comparison of the two graphic
techniques, the flowchart and Sutherland's graph, is made using an
example with inherent parallelism.

In Figure 7 a flowchart is given of the method of false
position for finding a zero of a function F using the proposed flowchart
programming system. G(A,B,C,D) is previously defined as a flowchart
function that computes the intersection of the line through the points
(A,C) and (B,D) with the x-axis. The flowchart of G is given in
Figure 8. F(X) is a previously defined flowchart function for
determining the value of the function F for the parameter X. In
Figure 9 the same example in its parallel form has been represented as
a Sutherland Graph. G and F are previously defined graphs (not shown).
Several operator nodes have been defined whose meaning is not entirely
obvious, and these are explained in Figure 10.
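
For readers without the figures at hand, the computation can be sketched in Python. The line through (A,C) and (B,D) crosses the x-axis at x = A - C(B - A)/(D - C), which is what G computes. The sketch does not reproduce the control structure of Figure 7; as one plausible source of parallelism it evaluates F at the two starting points concurrently, which is an assumption made here. The tolerance, the iteration limit, and the test function are likewise hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def G(a, b, c, d):
    """x-intercept of the line through (a, c) and (b, d); requires c != d."""
    return a - c * (b - a) / (d - c)


def false_position(F, a, b, tol=1e-10, max_iter=100):
    """Find a zero of F between a and b by the method of false position."""
    # F(a) and F(b) are independent, so the two evaluations can be forked
    # and then joined before the iteration starts.
    with ThreadPoolExecutor(max_workers=2) as pool:
        fa_future = pool.submit(F, a)
        fb_future = pool.submit(F, b)
        fa, fb = fa_future.result(), fb_future.result()
    if fa == 0:
        return a
    if fb == 0:
        return b
    if fa * fb > 0:
        raise ValueError("F(a) and F(b) must have opposite signs")

    x = a
    for _ in range(max_iter):
        x = G(a, b, fa, fb)          # where the chord crosses the x-axis
        fx = F(x)
        if abs(fx) <= tol:
            return x
        if fa * fx < 0:              # zero lies between a and x
            b, fb = x, fx
        else:                        # zero lies between x and b
            a, fa = x, fx
    return x


root = false_position(lambda x: x * x - 2.0, 1.0, 2.0)
print(root)
```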

Two  contrasting  features  of  these  two  graphs  are  quite
evident.

1.  The  construction  of  the  parallel  flowchart  from  the 
algorithm  is  quite  straightforward  in  contrast  to  the 
construction  of  the  Sutherland  Graph.   Determining  the 
control  and  data  lines  in  a  Sutherland  Graph  with  more 
than  a  few  operations  becomes  a  monumental  task. 

2.  An  even  more  apparent  contrast  is  the  ease  of  determining 
what  the  two  graphs  are  intended  to  represent  (accomplish). 
The  inherent  documentation  value  of  the  flowchart  is
made  quite  evident.

The  example  above  was  chosen  to  bring  to  light  the  advantages 
of  the  flowchart  technique  for  expressing  parallelism  in  programs  which 
are  not  trivial.   The  control  flow  remains  evident  and  relatively 
simple  in  the  flowchart  in  contrast  to  its  obfuscation  in  the
Sutherland  Graph  as  the  number  of  functions  increases. 

The  flowchart  of  Figure  7  brings  to  light  a  consideration  not 
previously  discussed.   After  process  symbol  (a)  is  executed  three 
parallel  paths  are  initiated,  and  two  conjunction  symbols  are  activated 
as  termination  points  for  these  paths.   Only  one  of  the  results 
calculated  at  (b)  and  (c)  is  used  in  later  calculations,  so  only  one 
conjunction  symbol  is  deactivated,  allowing  control  to  pass  through  it. 
The  other  conjunction  symbol  remains  active,  when  the  real  action  should 
be  that  it  become  nullified  without  allowing  control  to  pass  through  it. 
For  flexibility  in  expressing  parallelism,  the  ability  to  nullify  parallel 
paths  as  a  result  of  calculations  is  a  feature  worth
considering. 

D.   Conclusion 
In  the  near  future  explicit  parallelism  will  probably
enjoy  more  popularity  than  implicit  parallelism  for  the  following
reasons.

1.  The  extent  of  potential  parallelism  that  can  be 
recognized  at  the  compiler  level  is  restricted,  and 
the  compiler  algorithms  are  complicated  to  code. 

2.  The  efforts  to  re-introduce  parallelism  into 
algorithms  are  proving  fruitful.   The  need  for 
ways  to  communicate  this  parallelism  directly  to 
the  computer  is  becoming  evident. 

3.  Specifying  parallelism  explicitly  produces 
algorithms  which  express  the  highest  degree  of 
parallelism  possible  and  which  execute  in  the 
shortest  time  possible. 

The  flowchart  programming  system  is  a  proposed  extension 
to  an  online  graphical  method  of  representing  procedures  as  flowcharts. 
The  extension  will  add  the  capability  of  explicitly  specifying 
parallelism.   This  system  has  the  following  advantages:

1.  The  flowchart  is  a  natural  medium  in  which  to  study 
and  design  the  control  flow  in  a  parallel  algorithm. 

2.  Coding and documentation are prepared and updated
in one medium.

3.  As a flowchart is executed the explicit flow
of control can be followed by requesting that the
symbols be blinked as their associated texts are
executed. This is a useful debugging tool.


In  Figures  7  and  8  the  process  symbols  immediately  following
the  entry  process  symbols  contain  the  dummy  parameters  enclosed  in 
brackets  that  are  needed  by  these  flowchart  functions. 

SWITCH SYMBOL: Passes through the data value of one of the input lines.
The data value passed is that of the input line which most
recently became defined or signaled a change in data value. Care must
be taken in using this operator to assure that no two input lines could
ever change simultaneously.


PASS-THROUGH SYMBOL: Passes the data value of the bottom input on if the
top input signals that it is defined. The circle marker causes the top
input to be reset to the undefined state when a signal is received.


COMPARE SYMBOL: Compares the data values of the top and bottom input lines.
If top > bottom then the top output becomes defined.
If top = bottom then the middle output becomes defined.
If top < bottom then the bottom output becomes defined.
The data value of the output lines has no significance so care must be
taken in using these output lines as future input lines. If one of the
output lines is omitted and the corresponding condition occurs then no
signal is transmitted.

Figure  10. 